ABSTRACT

Isolated DNA encoding the enzyme I-SceI is provided.
The DNA sequence can be incorporated in cloning and
expression vectors, transformed cell lines and transgenic
animals. The vectors are useful in gene mapping and
site-directed insertion of genes.
ATG CAT ATG AAA AAC ATC AAA AAA AAC CAG GTA ATG 2670

1 MHMKNIKKNQVM 12

2671 AAC CTC GGT CCG AAC TCT AAA CTG CTG AAA GAA TAC AAA TCC CAG CTG ATC GAA CTG AAC 2730

13 NLGPNSKLLKEYKSQLIELN 32

2731 ATC GAA CAG TTC GAA GCA GGT ATC GGT CTG ATC CTG GGT GAT GCT TAC ATC CGT TCT CGT 2790

33 IEQFEAGIGLILGDAYIRSR 52

2791 GAT GAA GGT AAA ACC TAC TGT ATG CAG TTC GAG TGG AAA AAC AAA GCA TAC ATG GAC CAC 2850

53 DEGKTYCMQFEWKNKAYMDH 72

2851 GTA TGT CTG CTG TAC GAT CAG TGG GTA CTG TCC CCG CCG CAC AAA AAA GAA CGT GTT AAC 2910

73 VCLLYDQWVLSPPHKKERVN 92

2911 CAC CTG GGT AAC CTG GTA ATC ACC TGG GGC GCC CAG ACT TTC AAA CAC CAA GCT TTC AAC 2970

93 HLGNLVITWGAQTFKHQAFN 112

2971 AAA CTG GCT AAC CTG TTC ATC GTT AAC AAC AAA AAA ACC ATC CCG AAC AAC CTG GTT GAA 3030

113 KLANLFIVNNKKTIPNNLVE 132

3031 AAC TAC CTG ACC CCG ATG TCT CTG GCA TAC TGG TTC ATG GAT GAT GGT GGT AAA TGG GAT 3090

133 NYLTPMSLAYWFMDDGGKWD 152

3091 TAC AAC AAA AAC TCT ACC AAC AAA TCG ATC GTA CTG AAC ACC CAG TCT TTC ACT TTC GAA 3150

153 YNKNSTNKSIVLNTQSFTFE 172

3151 GAA GTA GAA TAC CTG GTT AAG GGT CTG CGT AAC AAA TTC CAA CTG AAC TGT TAC GTA AAA 3210

173 EVEYLVKGLRNKFQLNCYVK 192

3211 ATC AAC AAA AAC AAA CCG ATC ATC TAC ATC GAT TCT ATG TCT TAC CTG ATC TTC TAC AAC 3270

193 INKNKPIIYIDSMSYLIFYN 212

3271 CTG ATC AAA CCG TAC CTG ATC CCG CAG ATG ATG TAC AAA CTG CCG AAC ACT ATC TCC TCC 3330

213 LIKPYLIPQMMYKLPNTISS 232

3331 GAA ACT TTC CTG AAA TAA

233 ETFLK*
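You can check the codon rows above against their one-letter translations with the standard (universal) genetic code. The following Python sketch was added for illustration and is not part of the original disclosure. The names `CODON_TABLE` and `translate` are hypothetical, and `*` denotes a stop codon.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# each of the three positions running through T, C, A, G in that order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acids ('*' = stop).

    Whitespace, such as the spaces between codons in the table, is ignored.
    """
    seq = "".join(dna.split()).upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[n:n + 3]] for n in range(0, len(seq), 3))


if __name__ == "__main__":
    print(translate("ATG CAT ATG AAA AAC ATC AAA AAA AAC CAG GTA ATG"))  # MHMKNIKKNQVM
    print(translate("GAA ACT TTC CTG AAA TAA"))  # ETFLK*
```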



[0013] This invention also relates to a DNA sequence
comprising a promoter operatively linked to the DNA
sequence of the invention encoding the enzyme I-SceI.

[0014] This invention further relates to an isolated RNA
complementary to the DNA sequence of the invention
encoding the enzyme I-SceI and to the other DNA sequences
described herein.

[0015] In another embodiment of the invention, a vector is
provided. The vector comprises a plasmid, bacteriophage, or
cosmid vector containing the DNA sequence of the invention
encoding the enzyme I-SceI.

[0016] In addition, this invention relates to E. coli or
eukaryotic cells transformed with a vector of the invention.

[0017] Also, this invention relates to transgenic animals
containing the DNA sequence encoding the enzyme I-SceI
and cell lines cultured from cells of the transgenic animals.

[0018] In addition, this invention relates to a transgenic
organism in which at least one restriction site for the enzyme
I-SceI has been inserted in a chromosome of the organism.

[0019] Further, this invention relates to a method of
genetically mapping a eukaryotic genome using the enzyme
I-SceI.

[0020] This invention also relates to a method for in vivo
site-directed recombination in an organism using the enzyme
I-SceI.



BRIEF DESCRIPTION OF THE DRAWINGS 
[0021] This invention will be more fully described with 
reference to the drawings in which: 

[0022] FIG. 1 depicts the universal code equivalent of the
mitochondrial I-SceI gene.

[0023] FIG. 2 depicts the nucleotide sequence of the
invention encoding the enzyme I-SceI and the amino acid
sequence of the natural I-SceI enzyme.

[0024] FIG. 3 depicts the I-SceI recognition sequence and
indicates possible base mutations in the recognition site and
the effect of such mutations on stringency of recognition.

[0025] FIG. 4 is the nucleotide sequence and deduced
amino acid sequence of a region of plasmid pSCM525. The
nucleotide sequence of the invention encoding the enzyme
I-SceI is enclosed in the box.

[0026] FIG. 5 depicts variations around the amino acid
sequence of the enzyme I-SceI.

[0027] FIG. 6 shows Group I intron-encoded endonucleases
and related endonucleases.

[0028] FIG. 7 depicts yeast expression vectors containing
the synthetic gene for I-SceI.

[0029] FIG. 8 depicts the mammalian expression vector
pRSV I-SceI.

[0030] FIG. 9 is a restriction map of the plasmid pAF100.
(See also YEAST, 6:521-534, 1990, which is relied upon and
incorporated by reference herein.)
[0048] Cations: Enzymatic activity requires Mg²⁺ (8
mM is optimum). Mn²⁺ can replace Mg²⁺, but this
reduces the stringency of recognition.

[0049] Optimum conditions for activity: high pH (9
to 10), temperature 20-40° C, no monovalent cations.

[0050] Enzyme stability: I-SceI is unstable at room
temperature. The enzyme-substrate complex is more
stable than the enzyme alone (the presence of recognition
sites stabilizes the enzyme).

[0051] The enzyme I-SceI has a known recognition site
(ref. 14). The recognition site of I-SceI is a non-symmetrical
sequence that extends over 18 bp, as determined by systematic
mutational analysis. The sequence reads (arrows indicate cuts):




5' TAGGGATAA↓CAGGGTAAT 3'
3' ATCCC↑TATTGTCCCATTA 5'
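The following Python sketch locates the 18 bp site on either strand of a sequence. It assumes exact matches only, so it ignores the degenerate sites discussed below. It also assumes the cut positions shown by the arrows above: the top strand is cut after TAGGGATAA and the bottom strand between positions 5 and 6, which leaves a 4-nt 3' overhang. The function names are hypothetical. Cut positions are reported as 0-based boundaries on the top strand.

```python
from typing import List, Tuple

# I-SceI recognition site, top strand written 5'->3'.
SITE = "TAGGGATAACAGGGTAAT"
TOP_CUT = 9     # top strand: TAGGGATAA | CAGGGTAAT
BOTTOM_CUT = 5  # bottom strand: cut opposite the boundary after base 5 (4-nt 3' overhang)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_isce1_sites(dna: str) -> List[Tuple[int, str, int, int]]:
    """Find exact I-SceI sites on both strands of `dna` (given 5'->3').

    Returns (start, orientation, top_cut, bottom_cut) tuples. `start` is the
    0-based index of the 18 bp window. Cut positions are 0-based boundaries on
    the top strand (boundary k lies between bases k-1 and k).
    """
    dna = dna.upper()
    rc_site = reverse_complement(SITE)
    n = len(SITE)
    hits = []
    for i in range(len(dna) - n + 1):
        window = dna[i:i + n]
        if window == SITE:
            hits.append((i, "+", i + TOP_CUT, i + BOTTOM_CUT))
        elif window == rc_site:
            # The site lies on the bottom strand, so the cut positions are mirrored.
            hits.append((i, "-", i + n - BOTTOM_CUT, i + n - TOP_CUT))
    return hits


if __name__ == "__main__":
    print(find_isce1_sites("GG" + SITE + "CC"))  # [(2, '+', 11, 7)]
```

Because the site is non-palindromic, the scan has to test the two orientations separately. A palindromic restriction site would match both at once.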

[0052] The recognition site corresponds, in part, to the
upstream exon and, in part, to the downstream exon of the
intron-plus form of the gene.

[0053] The recognition site is partially degenerate: single
base substitutions within the 18 bp long sequence result in
either complete insensitivity or reduced sensitivity to the
enzyme, depending upon the position and nature of the substitution.

[0054] The stringency of recognition has been measured on:

[0055] 1. Mutants of the site.

[0056] 2. The total yeast genome (Saccharomyces
cerevisiae; genome complexity is 1.4×10^7 bp). Data
are unpublished.

[0057] Results are:

[0058] 1. Mutants of the site: As shown in FIG. 3,
there is a general shifting of stringency, i.e., mutants
severely affected in Mg²⁺ become partially affected
in Mn²⁺, and mutants partially affected in Mg²⁺ become
unaffected in Mn²⁺.

[0059] 2. Yeast: In magnesium conditions, no cleavage
is observed in normal yeast. Under the same conditions,
DNA from transgenic yeasts is cleaved to
completion at the artificially inserted I-SceI site, and
no other cleavage site can be detected. If magnesium
is replaced by manganese, five additional cleavage
sites are revealed in the entire yeast genome, none of
which is cleaved to completion. Therefore, in manganese
the enzyme reveals an average of 1 site for ca.
3 million base pairs (5 sites/1.4×10^7 bp).
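The average spacing quoted above is simple arithmetic on the yeast genome complexity given in [0056]:

```latex
\frac{1.4\times10^{7}\ \text{bp}}{5\ \text{sites}}
  = 2.8\times10^{6}\ \text{bp per site}
  \approx 3\times10^{6}\ \text{bp per site}
```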

[0060] Definition of the recognition site: important bases
are indicated in FIG. 3. They correspond to bases for which
severely affected mutants exist. Note, however, that:

[0061] 1. Not all possible mutations at each position
have been determined. Therefore, a base that does
not correspond to a severely affected mutant may
still prove important if other mutants were examined at
this very same position.

[0062] 2. There is no clear-cut limit between a very
important base (all mutants are severely affected)
and a moderately important base (some of the
mutants are severely affected). There is a continuum
between excellent substrates and poor substrates for
the enzyme.

[0063] The expected frequency of natural I-SceI sites in a
random DNA sequence is, therefore, equal to (0.25)^18, or
1.5×10^-11. In other words, one should expect one natural
site for the equivalent of ca. 20 human genomes, but the
frequency of degenerate sites is more difficult to predict.
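The estimate in [0063] can be reproduced with a short Python calculation. The calculation makes the same implicit assumption as the text, namely equal base frequencies, and it counts only one orientation of the site. It also uses an approximate haploid human genome size of 3×10^9 bp. That genome size is an assumption added here and does not come from the source.

```python
SITE_LENGTH = 18          # bp, length of the I-SceI recognition site
HUMAN_GENOME_BP = 3.0e9   # ASSUMPTION: approximate haploid human genome size

# Probability that a given position starts an exact site (one orientation,
# all four bases assumed to be equally frequent).
p_site = 0.25 ** SITE_LENGTH
bp_per_site = 1.0 / p_site
genomes_per_site = bp_per_site / HUMAN_GENOME_BP

print(f"p = {p_site:.1e}")                                # p = 1.5e-11
print(f"one site per {bp_per_site:.2e} bp")               # one site per 6.87e+10 bp
print(f"~{genomes_per_site:.0f} human genomes per site")  # ~23 human genomes per site
```

The result of about 23 genomes per site matches the text's "ca. 20" at this level of precision. Counting both orientations would roughly halve that figure.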

[0064] I-SceI belongs to a "degenerate" subfamily of the
two-dodecapeptide family. Conserved amino acids of the
dodecapeptide motifs are required for activity. In particular,
the aspartic residues at positions 9 of the two dodecapeptides
cannot be replaced, even with glutamic residues. It is likely
that the dodecapeptides form the catalytic site or part of it.

[0065] Consistent with the recognition site being non-symmetrical,
it is likely that the endonucleolytic activity of
I-SceI requires two successive recognition steps: binding of
the enzyme to the downstream half of the site (corresponding
to the downstream exon) followed by binding of the
enzyme to the upstream half of the site (corresponding to the
upstream exon). The first binding is strong and the second is
weaker, but both are necessary for cleavage of DNA. In
vitro, the enzyme can bind the downstream exon alone as
well as the intron-exon junction sequence, but no cleavage
results.

[0066] The evolutionarily conserved dodecapeptide
motifs of intron-encoded I-SceI are essential for endonuclease
activity. It has been proposed that the role of these
motifs is to properly position the acidic amino acids with
respect to the DNA sequence recognition domains of the
enzyme for the catalysis of phosphodiester bond hydrolysis
(ref. 13).

[0067] The nucleotide sequence of the invention, which
encodes the natural I-SceI enzyme, is shown in FIG. 2. The
nucleotide sequence of the gene of the invention was derived
by dideoxynucleotide sequencing. The base sequences of the
nucleotides are written in the 5'→3' direction. Each of the
letters shown is a conventional designation for the following
nucleotides:



A Adenine 
G Guanine 
T Thymine 
C Cytosine. 



[0068] It is preferred that the DNA sequence encoding the
enzyme I-SceI be in a purified form. For instance, the
sequence can be free of human blood-derived proteins,
human serum proteins, viral proteins, nucleotide sequences
encoding these proteins, human tissue, human tissue components,
or combinations of these substances. In addition, it
is preferred that the DNA sequence of the invention be free
of extraneous proteins and lipids, and of adventitious microorganisms,
such as bacteria and viruses. The essentially
purified and isolated DNA sequence encoding I-SceI is
especially useful for preparing expression vectors.

[0069] Plasmid pSCM525 is a pUC12 derivative containing
an artificial gene that carries the DNA sequence of the
invention. The nucleotide sequence and deduced amino acid
sequence of a region of plasmid pSCM525 are shown in FIG.
4. The nucleotide sequence of the invention encoding I-SceI
is enclosed in the box. The artificial gene is a BamHI-SalI
piece of DNA of 723 base pairs, chemically
synthesized and assembled. It is placed under tac promoter
control. The DNA sequence of the artificial gene differs from
the natural coding sequence or its universal code equivalent
described in Cell (1986), Vol. 44, pages 521-533. However,
the translation product of the artificial gene is identical in
sequence to the genuine omega-endonuclease except for the
addition of a Met-His at the N-terminus. It will be understood
that this modified endonuclease is within the scope of
this invention.

[0070] Plasmid pSCM525 can be used to transform any
suitable E. coli strain, and transformed cells become
ampicillin-resistant. Synthesis of the omega-endonuclease is
obtained by addition of IPTG or an equivalent inducer of
the lactose operon system.

[0071] A plasmid identified as pSCM525 encoding the
enzyme I-SceI was deposited in E. coli strain TG1 with the
Collection Nationale de Cultures de Microorganismes
(C.N.C.M.) of Institut Pasteur in Paris, France on Nov. 22,
1990, under culture collection deposit Accession No. I-1014.
The nucleotide sequence of the invention is thus available
from this deposit.

[0072] The gene of the invention can also be prepared by
the formation of 3'→5' phosphate linkages between nucleoside
units using conventional chemical synthesis techniques.
For example, the well-known phosphodiester, phosphotriester,
and phosphite triester techniques, as well as known
modifications of these approaches, can be employed.
Deoxyribonucleotides can be prepared with automatic synthesis
machines, such as those based on the phosphoramidite
approach. Oligo- and polyribonucleotides can also be
obtained with the aid of RNA ligase using conventional
techniques.

[0073] This invention of course includes variants of the
DNA sequence of the invention exhibiting substantially the
same properties as the sequence of the invention. By this it
is meant that DNA sequences need not be identical to the
sequence disclosed herein. Variations can be attributable to
single or multiple base substitutions, deletions, or insertions,
or to local mutations involving one or more nucleotides, that do not
substantially detract from the properties of the DNA
sequence as encoding an enzyme having the cleavage properties
of the enzyme I-SceI.

[0074] FIG. 5 depicts some of the variations that can be 
made around the I-Scel amino acid sequence. It has been 
demonstrated that the following positions can be changed 
without affecting enzyme activity: 



positions -1 and -2: not natural; these two amino acids are added due to cloning strategies.
positions 1 to 10: can be deleted.
position 36: G is tolerated.
position 40: M or V are tolerated.
position 41: S or N are tolerated.
position 43: A is tolerated.
position 46: V or N are tolerated.
position 91: A is tolerated.
positions 123 and 156: L is tolerated.
position 223: A and S are tolerated.



[0075] It will be understood that enzymes containing these 
modifications are within the scope of this invention. 

[0076] Changes to the amino acid sequence in FIG. 5 that 
have been demonstrated to affect enzyme activity are as 
follows: 



position 19: L to S
position 38: I to S or N
position 39: G to D or R
position 40: L to Q
position 42: L to R
position 44: D to E, G or H
position 45: A to E or D
position 46: Y to D
position 47: I to R or N
position 80: L to S
position 144: D to E
position 145: D to E
position 146: G to E
position 147: G to S
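The two lists above can be encoded as lookup tables. This Python sketch is illustrative only. It uses the FIG. 5 numbering and records only the substitutions named in [0074] and [0076]. Every other change is labeled "not reported", and that label should not be read as meaning the change is tolerated. The names are hypothetical.

```python
from typing import Optional

# FIG. 5 numbering: natural I-SceI; -1 and -2 are the Met-His added for cloning.
TOLERATED = {
    36: "G", 40: "MV", 41: "SN", 43: "A", 46: "VN",
    91: "A", 123: "L", 156: "L", 223: "AS",
}
AFFECTING = {
    19: "S", 38: "SN", 39: "DR", 40: "Q", 42: "R", 44: "EGH", 45: "ED",
    46: "D", 47: "RN", 80: "S", 144: "E", 145: "E", 146: "E", 147: "S",
}
DELETABLE = range(1, 11)  # the text reports that positions 1 to 10 can be deleted


def classify_change(position: int, new_residue: Optional[str]) -> str:
    """Look up a single change; new_residue=None means deletion of that position.

    Returns "tolerated", "affects activity", or "not reported" (no data in the
    text, which does not imply that the change is harmless).
    """
    if new_residue is None:
        return "tolerated" if position in DELETABLE else "not reported"
    residue = new_residue.upper()
    if len(residue) != 1:
        raise ValueError("expected a one-letter amino acid code")
    if residue in TOLERATED.get(position, ""):
        return "tolerated"
    if residue in AFFECTING.get(position, ""):
        return "affects activity"
    return "not reported"


if __name__ == "__main__":
    print(classify_change(40, "V"))  # tolerated
    print(classify_change(40, "Q"))  # affects activity
    print(classify_change(44, "A"))  # not reported
```

Positions 40 and 46 appear in both tables. The lookup is keyed by the replacement residue as well as the position, and that is why, for example, L40M is tolerated while L40Q is not.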



[0077] It will also be understood that the present invention
is intended to encompass fragments of the DNA sequence of
the invention in purified form, where the fragments are
capable of encoding enzymatically active I-SceI.

[0078] The DNA sequence of the invention coding for the
enzyme I-SceI can be amplified in the well-known polymerase
chain reaction (PCR), which is useful for amplifying
all or specific regions of the gene. See, e.g., S. Kwok et al.,
J. Virol. 61:1690-1694 (1987); U.S. Pat. Nos. 4,683,202
and 4,683,195. More particularly, DNA primer pairs of
known sequence positioned 10-300 base pairs apart that are
complementary to the plus and minus strands of the DNA to
be amplified can be prepared by well-known techniques for
the synthesis of oligonucleotides. One end of each primer
can be extended and modified to create restriction endonuclease
sites when the primer is annealed to the DNA. The
PCR reaction mixture can contain the DNA, the DNA primer
pairs, four deoxyribonucleoside triphosphates, MgCl2, DNA
polymerase, and conventional buffers. The DNA can be
amplified for a number of cycles. It is generally possible to
increase the sensitivity of detection by using a multiplicity
of cycles, each cycle consisting of a short period of denaturation
of the DNA at an elevated temperature, cooling of
the reaction mixture, and polymerization with the DNA
polymerase. Amplified sequences can be detected by the use
of a technique termed oligomer restriction (OR). See R. K.
Saiki et al., Bio/Technology 3:1008-1012 (1985).
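Each cycle can at most double the target, so the gain from using a multiplicity of cycles is exponential. The Python sketch below models this idealized yield. It assumes a constant per-cycle efficiency, which is a hypothetical parameter. Real reactions fall short of this model and eventually plateau.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield after `cycles` cycles.

    Each cycle (denaturation, cooling/annealing, polymerization) multiplies the
    target by (1 + efficiency); efficiency = 1.0 means perfect doubling.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    print(pcr_copies(1, 30))       # 1073741824.0
    print(pcr_copies(1, 30, 0.9))  # fewer copies at 90% efficiency
```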

[0079] The enzyme I-SceI is one of a number of endonucleases
with similar properties. Following is a listing of
related enzymes and their sources.

[0080] Group I intron-encoded endonucleases and related
enzymes are listed below with references. Recognition sites
are shown in FIG. 6.



Enzyme       Encoded by              Ref.
I-SceI       Sc LSU-1 intron         this work
I-SceII      Sc cox1-4 intron        Sargueil et al., NAR (1990) 18, 5659-5665
I-SceIII     Sc cox1-3 intron        Sargueil et al., MGG (1991) 225, 340-341
I-SceIV      Sc cox1-5a intron       Seraphin et al. (1992), in press
I-CeuI       Ce LSU-5 intron         Marshall, Lemieux, Gene (1991) 104, 241-245
I-CreI       Cr LSU-1 intron         Rochaix (unpublished)
I-PpoI       Pp LSU-3 intron         Muscarella et al., MCB (1990) 10, 3386-3396
I-TevI       T4 td-1 intron          Chu et al., PNAS (1990) 87, 3574-3578, and
                                     Bell-Pedersen et al., NAR (1990) 18, 3763-3770
I-TevII      T4 sunY intron          Bell-Pedersen et al., NAR (1990) 18, 3763-3770
I-TevIII     RB3 nrdB-1 intron       Eddy, Gold, Genes Dev. (1991) 5, 1032-1041
HO           HO yeast gene           Nickoloff et al., MCB (1990) 10, 1174-1179
Endo SceI    RF3 yeast mito. gene    Kawasaki et al., JBC (1991) 266, 5342-5347



[0081] Putative new enzymes (genetic evidence but no
activity as yet) are I-CsmI from cytochrome b intron 1 of
Chlamydomonas smithii mitochondria (ref. 15), I-PanI from
cytochrome b intron 3 of Podospora anserina mitochondria
(Jill Salvo), and probably enzymes encoded by introns Nc
nd1 and Nc cob1 from Neurospora crassa.

[0082] The I-endonucleases can be classified as follows:

[0083] Class I: Two dodecapeptide motifs, 4 bp staggered
cut with 3' OH overhangs, cut internal to recognition
site:

Subclass "I-SceI"          Other subclasses
I-SceI                     I-SceII
I-SceIV                    I-SceIII
I-CsmI                     I-CeuI (only one dodecapeptide motif)
I-PanI                     I-CreI (only one dodecapeptide motif)
HO
TFP1-408 (HO homolog)
Endo SceI



[0084] Class II: GIY-(N10-11) YIG motif, 2 bp staggered
cut with 3' OH overhangs, cut external to recognition
site:

[0085] I-TevI

[0086] Class III: no typical structural motifs, 4 bp
staggered cut with 3' OH overhangs, cut internal to
recognition site:

[0087] I-PpoI

[0088] Class IV: no typical structural motifs, 2 bp
staggered cut with 3' OH overhangs, cut external to
recognition site:

[0089] I-TevII

[0090] Class V: no typical structural motifs, 2 bp staggered
cut with 5' OH overhangs:

[0091] I-TevIII.

[0092] 2. Nucleotide Probes Containing the I-SceI Gene of
the Invention

[0093] The DNA sequence of the invention coding for the
enzyme I-SceI can also be used as a probe for the detection
of a nucleotide sequence in a biological material, such as
tissue or body fluids. The probe can be labeled with an atom
or inorganic radical, most commonly using a radionuclide,
but also perhaps with a heavy metal. Radioactive labels
include ³²P, ³H, ¹⁴C, or the like. Any radioactive label can
be employed that provides an adequate signal and has
sufficient half-life. Other labels include ligands that can
serve as a specific binding member to a labeled antibody,
fluorescers, chemiluminescers, enzymes, antibodies that
can serve as a specific binding pair member for a labeled
ligand, and the like. The choice of the label will be governed
by the effect of the label on the rate of hybridization and
binding of the probe to the DNA or RNA. The label must
provide sufficient sensitivity to detect the amount of DNA
or RNA available for hybridization.

[0094] When the nucleotide sequence of the invention is
used as a probe for hybridizing to a gene, the nucleotide
sequence is preferably affixed to a water-insoluble solid,
porous support, such as nitrocellulose paper. Hybridization
can be carried out using labeled polynucleotides of the
invention and conventional hybridization reagents. The particular
hybridization technique is not essential to the invention.

[0095] The amount of labeled probe present in the hybridization
solution will vary widely, depending upon the nature
of the label, the amount of the labeled probe that can
reasonably bind to the support, and the stringency of the
hybridization. Generally, substantial excesses of the probe
over stoichiometric will be employed to enhance the rate of
binding of the probe to the fixed DNA.

[0096] Various degrees of stringency of hybridization can
be employed. The more severe the conditions, the greater the
complementarity that is required for hybridization between
the probe and the polynucleotide for duplex formation.
Severity can be controlled by temperature, probe concentration,
probe length, ionic strength, time, and the like.
Conveniently, the stringency of hybridization is varied by
changing the polarity of the reactant solution. Temperatures
to be employed can be empirically determined or determined
from well-known formulas developed for this purpose.
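One such well-known formula is the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C). It is a rule of thumb for short oligonucleotides, roughly 20 nt or fewer, hybridized in high-salt buffer. It does not apply to long probes such as the full-length I-SceI gene. The Python sketch below is illustrative, and the function name is hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Wallace-rule melting temperature estimate (deg C) for a short oligonucleotide."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # The 18 bp I-SceI site used as an example oligonucleotide: 11 A/T and 7 G/C.
    print(wallace_tm("TAGGGATAACAGGGTAAT"))  # 50
```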

[0097] 3. Nucleotide Sequences Containing the Nucleotide
Sequence Encoding I-SceI

[0098] This invention also relates to the DNA sequence of
the invention encoding the enzyme I-SceI, wherein the
nucleotide sequence is linked to other nucleic acids. The
nucleic acid can be obtained from any source, for example,
from plasmids, from cloned DNA or RNA, or from natural
DNA or RNA from any source, including prokaryotic and
eukaryotic organisms. DNA or RNA can be extracted from
a biological material, such as biological fluids or tissue, by
a variety of techniques, including those described by Maniatis
et al., Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor Laboratory, New York (1982). The nucleic
acid will generally be obtained from bacteria, yeast, virus,
or a higher organism, such as a plant or animal. The nucleic
acid can be a fraction of a more complex mixture, such as
a portion of a gene contained in whole human DNA or a
portion of a nucleic acid sequence of a particular microorganism.
The nucleic acid can be a fraction of a larger
molecule, or the nucleic acid can constitute an entire gene or
assembly of genes. The DNA can be in a single-stranded or
double-stranded form. If the fragment is in single-stranded
form, it can be converted to double-stranded form using
DNA polymerase according to conventional techniques.

[0099] The DNA sequence of the invention can be linked
to a structural gene. As used herein, the term "structural
gene" refers to a DNA sequence that encodes, through its
template or messenger RNA (mRNA), a sequence of amino acids
characteristic of a specific protein or polypeptide. The
nucleotide sequence of the invention can function with an
expression control sequence, that is, a DNA sequence that
controls and regulates expression of the gene when operatively
linked to the gene.

[0100] 4. Vectors Containing the Nucleotide Sequence of 
the Invention 

[0101] This invention also relates to cloning and expression
vectors containing the DNA sequence of the invention
coding for the enzyme I-SceI.

[0102] More particularly, the DNA sequence encoding the
enzyme can be ligated to a vehicle for cloning the sequence.
The major steps involved in gene cloning comprise procedures
for separating DNA containing the gene of interest
from prokaryotes or eukaryotes, cutting the resulting DNA
fragment and the DNA from a cloning vehicle at specific
sites, mixing the two DNA fragments together, and ligating
the fragments to yield a recombinant DNA molecule. The
recombinant molecule can then be transferred into a host
cell, and the cells allowed to replicate to produce identical
cells containing clones of the original DNA sequence.

[0103] The vehicle employed in this invention can be any
double-stranded DNA molecule capable of transporting the
nucleotide sequence of the invention into a host cell and
capable of replicating within the cell. More particularly, the
vehicle must contain at least one DNA sequence that can act
as the origin of replication in the host cell. In addition, the
vehicle must contain two or more sites for insertion of the
DNA sequence encoding the gene of the invention. These
sites will ordinarily correspond to restriction enzyme sites at
which cohesive ends can be formed, and which are complementary
to the cohesive ends on the promoter sequence to be
ligated to the vehicle. In general, this invention can be
carried out with plasmid, bacteriophage, or cosmid vehicles
having these characteristics.

[0104] The nucleotide sequence of the invention can have
cohesive ends compatible with any combination of sites in
the vehicle. Alternatively, the sequence can have one or
more blunt ends that can be ligated to corresponding blunt
ends in the cloning sites of the vehicle. The nucleotide
sequence to be ligated can be further processed, if desired,
by successive exonuclease deletion, such as with the enzyme
Bal 31. In the event that the nucleotide sequence of the
invention does not contain a desired combination of cohesive
ends, the sequence can be modified by adding a linker,
an adaptor, or homopolymer tailing.

[0105] It is preferred that plasmids used for cloning nucleotide
sequences of the invention carry one or more genes
responsible for a useful characteristic, such as a selectable
marker, displayed by the host cell. In a preferred strategy,
plasmids having genes for resistance to two different drugs
are chosen. For example, insertion of the DNA sequence into
a gene for an antibiotic inactivates the gene and destroys
drug resistance. The second drug resistance gene is not
affected when cells are transformed with the recombinants,
and colonies containing the gene of interest can be selected
by resistance to the second drug and susceptibility to the first
drug. Preferred antibiotic markers are genes imparting
chloramphenicol, ampicillin, or tetracycline resistance to the
host cell.
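The two-drug screen described above reduces to a simple decision rule applied to each colony after replica plating. The Python sketch below is illustrative only. It assumes the insert was cloned into the first resistance gene, and the function name and return labels are hypothetical.

```python
def classify_colony(grows_on_first_drug: bool, grows_on_second_drug: bool) -> str:
    """Interpret insertional-inactivation plating results for one colony.

    Assumes the insert disrupts the first resistance gene while the second
    resistance gene on the same vector stays intact.
    """
    if not grows_on_second_drug:
        return "untransformed"          # no intact vector marker at all
    if grows_on_first_drug:
        return "vector without insert"  # first resistance gene still intact
    return "recombinant"                # first gene inactivated by the insert


if __name__ == "__main__":
    print(classify_colony(grows_on_first_drug=False, grows_on_second_drug=True))  # recombinant
```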

[0106] A variety of restriction enzymes can be used to cut 
the vehicle. The identity of the restriction enzyme will 
generally depend upon the identity of the ends on the DNA 
sequence to be ligated and the restriction sites in the vehicle. 
The restriction enzyme is matched to the restriction sites in 
the vehicle, which in turn is matched to the ends on the 
nucleic acid fragment being ligated. 
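Matching enzymes to ends comes down to comparing the single-stranded overhangs they leave. The minimal sketch below assumes palindromic recognition sites that are cut symmetrically to give 5' overhangs. Under that assumption, BamHI (G^GATCC) and BglII (A^GATCT) both leave 5'-GATC ends and can be ligated to each other. SalI (G^TCGAC) and XhoI (C^TCGAG) both leave 5'-TCGA and are likewise compatible. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def five_prime_overhang(site: str, cut: int) -> str:
    """Return the 5' overhang left by cutting a palindromic site after `cut` bases on each strand."""
    site = site.upper()
    if site != reverse_complement(site):
        raise ValueError("sketch assumes a palindromic recognition site")
    end = len(site) - cut  # symmetric bottom-strand cut, in top-strand coordinates
    if not 0 < cut < end:
        raise ValueError("sketch handles only 5' overhangs")
    return site[cut:end]


def ends_compatible(site_a: str, cut_a: int, site_b: str, cut_b: int) -> bool:
    """True if two such digests leave identical (hence annealable) overhangs."""
    return five_prime_overhang(site_a, cut_a) == five_prime_overhang(site_b, cut_b)


if __name__ == "__main__":
    print(ends_compatible("GGATCC", 1, "AGATCT", 1))  # True  (BamHI / BglII)
    print(ends_compatible("GTCGAC", 1, "CTCGAG", 1))  # True  (SalI / XhoI)
    print(ends_compatible("GGATCC", 1, "GTCGAC", 1))  # False (BamHI / SalI)
```

Comparing overhangs for identity works here only because a symmetric cut in a palindromic site leaves a self-complementary overhang. Non-palindromic ends, such as those left by I-SceI, would need an explicit complementarity check.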

[0107] The ligation reaction can be set up using well-known
techniques and conventional reagents. Ligation is
carried out with a DNA ligase that catalyzes the formation
of phosphodiester bonds between adjacent 5'-phosphate and
free 3'-hydroxyl groups in DNA duplexes. The DNA
ligase can be derived from a variety of microorganisms. The
preferred DNA ligases are enzymes from E. coli and
bacteriophage T4. T4 DNA ligase can ligate DNA fragments
with blunt or sticky ends, such as those generated by
restriction enzyme digestion. E. coli DNA ligase can be used
to catalyze the formation of phosphodiester bonds between
the termini of duplex DNA molecules containing cohesive
ends.

[0108] Cloning can be carried out in prokaryotic or
eukaryotic cells. The host for replicating the cloning vehicle
will of course be one that is compatible with the vehicle and
in which the vehicle can replicate. When a plasmid is
employed, the plasmid can be derived from bacteria or some
other organism, or the plasmid can be synthetically prepared.
The plasmid can replicate independently of the host cell
chromosome, or an integrative plasmid (episome) can be
employed. The plasmid can make use of the DNA replicative
enzymes of the host cell in order to replicate, or the plasmid
can carry genes that code for the enzymes required for
plasmid replication. A number of different plasmids can be
employed in practicing this invention.

[0109] The DNA sequence of the invention encoding the
enzyme I-SceI can also be ligated to a vehicle to form an
expression vector. The vehicle employed in this case is one
in which it is possible to express the gene operatively linked
to a promoter in an appropriate host cell. It is preferable to
employ a vehicle known for use in expressing genes in E.
coli, yeast, or mammalian cells. These vehicles include, for
example, the following E. coli expression vectors:

[0110] pSCM525, which is an E. coli expression vector
derived from pUC12 by insertion of a tac promoter and
the synthetic gene for I-SceI. Expression is induced by
IPTG.
[0111] pGEXω6, which is an E. coli expression vector
derived from pGEX in which the synthetic gene from
pSCM525 for I-SceI is fused with the glutathione S-transferase
gene, producing a hybrid protein. The
hybrid protein possesses the endonuclease activity.

[0112] pDIC73, which is an E. coli expression vector
derived from pET-3C by insertion of the synthetic gene
for I-SceI (NdeI-BamHI fragment of pSCM525)
under T7 promoter control. This vector is used in strain
BL21 (DE3), which expresses the T7 RNA polymerase
under IPTG induction.

[0113] pSCM351, which is an E. coli expression vector
derived from pUR291 in which the synthetic gene for
I-SceI is fused with the LacZ gene, producing a hybrid
protein.

[0114] pSCM353, which is an E. coli expression vector
derived from pEX1 in which the synthetic gene for
I-SceI is fused with the Cro/LacZ gene, producing a
hybrid protein.

[0115] Examples of yeast expression vectors are: 

[0116] pPEX7, which is a yeast expression vector 
derived from pRP51-Bam O (a LEU2d derivative of 
pLG-SD5) by insertion of the synthetic gene under the 
control of the galactose promoter. Expression is 
induced by galactose. 

[0117] pPEX408, which is a yeast expression vector 
derived from pLG-SD5 by insertion of the synthetic 
gene under the control of the galactose promoter. 
Expression is induced by galactose. 

[0118] Several yeast expression vectors are depicted in 
FIG. 7. 

[0119] Typical mammalian expression vectors are:

[0120] pRSV I-SceI, which is a pRSV derivative in
which the synthetic gene (BamHI-PstI fragment from
pSCM525) is under the control of the LTR promoter of
Rous Sarcoma Virus. This expression vector is depicted
in FIG. 8.

[0121] Vectors for expression in Chinese Hamster Ovary 
(CHO) cells can also be employed. 

[0122] 5. Cells Transformed with Vectors of the Invention 

[0123] The vectors of the invention can be inserted into 
host organisms using conventional techniques. For example, 
the vectors can be inserted by transformation, transfection, 
electroporation, microinjection, or by means of liposomes 
(lipofection). 

[0124] Cloning can be carried out in prokaryotic or
eukaryotic cells. The host for replicating the cloning vehicle
will of course be one that is compatible with the vehicle and
in which the vehicle can replicate. Cloning is preferably
carried out in bacterial or yeast cells, although cells of
fungal, animal, and plant origin can also be employed. The
preferred host cells for conducting cloning work are bacterial
cells, such as E. coli. The use of E. coli cells is
particularly preferred because most cloning vehicles, such as
bacterial plasmids and bacteriophages, replicate in these
cells.



[0125] In a preferred embodiment of this invention, an
expression vector containing the DNA sequence of the
invention encoding I-SceI, operatively linked to a promoter,
is inserted into a mammalian cell using conventional
techniques.

Application of I-SceI for Large-Scale Mapping

[0126] 1. Occurrence of Natural Sites in Various Genomes

[0127] Using the purified I-SceI enzyme, the occurrence
of natural or degenerate sites has been examined on the
complete genomes of several species. No natural site was
found in Saccharomyces cerevisiae, Bacillus anthracis, Borrelia
burgdorferi, Leptospira biflexa and L. interrogans. One
degenerate site was found on T7 phage DNA.
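The text does not say how degenerate sites were defined, and the survey above was biochemical rather than computational. As a purely illustrative Python sketch, the scan below flags any 18 bp window within a chosen number of mismatches (Hamming distance) of the site on either strand. Treating all positions equally is an assumption. Per FIG. 3 and [0060] to [0062], the effect of a substitution depends on its position and base, so a realistic screen would weight positions. The names are hypothetical.

```python
from typing import List, Tuple

SITE = "TAGGGATAACAGGGTAAT"  # I-SceI recognition site, top strand 5'->3'
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_for_sites(genome: str, max_mismatches: int = 1) -> List[Tuple[int, str, int]]:
    """Return (start, orientation, mismatches) for windows close to the site."""
    genome = genome.upper()
    targets = (("+", SITE), ("-", reverse_complement(SITE)))
    n = len(SITE)
    hits = []
    for i in range(len(genome) - n + 1):
        window = genome[i:i + n]
        for orientation, target in targets:
            d = hamming(window, target)
            if d <= max_mismatches:
                hits.append((i, orientation, d))
    return hits


if __name__ == "__main__":
    mutant = SITE[:-1] + "A"  # single substitution at the last position
    print(scan_for_sites("CCCC" + mutant + "CCCC"))  # [(4, '+', 1)]
```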

[0128] 2. Insertion of Artificial Sites 

[0129] Given the absence of natural I-SceI sites, artificial sites can be introduced by transformation or transfection. Two cases need to be distinguished: site-directed integration by homologous recombination, and random integration by non-homologous recombination, transposon movement, or retroviral infection. The first is easy in yeast and a few bacterial species, but more difficult in higher eukaryotes. The second is possible in all systems.

[0130] 3. Insertion Vectors 

[0131] Two types can be distinguished: 

[0132] 1. Site-specific cassettes that introduce the I-SceI site together with a selectable marker.

[0133] For yeast: all are pAF100 derivatives (Thierry et al. (1990) YEAST 6:521-534) containing the following marker genes:

pAF101: URA3 (inserted in the HindIII site)
pAF103: Neo^R (inserted in the BglII site)
pAF104: HIS3 (inserted in the BglII site)
pAF105: Kan^R (inserted in the BglII site)
pAF106: Kan^R (inserted in the BglII site)
pAF107: LYS2 (inserted between HindIII and EcoRV)



[0134] A restriction map of the plasmid pAF100 is shown in FIG. 9. The nucleotide sequence and restriction sites of regions of plasmid pAF100 are shown in FIGS. 10A and 10B. Many transgenic yeast strains with the I-SceI site at various known places along the chromosomes are available.

[0135] 2. Vectors derived from transposable elements or retroviruses.

[0136] For E. coli and other bacteria: mini Tn5 derivatives containing the I-SceI site and

[0137] pTSm ω Str^R

[0138] pTKm ω Kan^R (See FIG. 11)

[0139] pTTc ω Tet^R

[0140] For yeast: pTyω6 is a pD123 derivative in which the I-SceI site has been inserted in the LTR of the Ty element (FIG. 12).
[0141] For mammalian cells:

[0142] pMLV LTR SAPLZ: contains the I-SceI site in the LTR of MLV and Phleo-LacZ (FIG. 13). This vector is first grown in opl cells (3T3 derivative, from R. Mulligan). Two transgenic cell lines with the I-SceI site at undetermined locations in the genome are available: 1009 (pluripotent nerve cells, J. F. Nicolas) and D3 (ES cells able to generate transgenic animals).

[0143] 4. The Nested Chromosomal Fragmentation Strategy

[0144] The nested chromosomal fragmentation strategy for genetically mapping a eukaryotic genome exploits the unique properties of the restriction endonuclease I-SceI, such as its 18 bp recognition site. It also exploits the absence of natural I-SceI recognition sites in most eukaryotic genomes.

[0145] First, one or more I-SceI recognition sites are artificially inserted at various positions in a genome, either by homologous recombination using specific cassettes containing selectable markers or by random insertion, as discussed supra. The genome of the resulting transgenic strain is then cleaved completely at the artificially inserted I-SceI site(s) upon incubation with the I-SceI restriction enzyme. The cleavage produces nested chromosomal fragments.

[0146] The chromosomal fragments are then purified and separated by pulsed field gel (PFG) electrophoresis, allowing one to "map" the position of the inserted site in the chromosome. If total DNA is cleaved with the restriction enzyme, each artificially introduced I-SceI site provides a unique "molecular milestone" in the genome. Thus, a set of transgenic strains, each carrying a single I-SceI site, can be created which defines physical genomic intervals between the milestones. Consequently, an entire genome, a chromosome, or any segment of interest can be mapped using artificially introduced I-SceI restriction sites.
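The size arithmetic behind the milestones can be made explicit. The sketch below, with hypothetical function names, computes the bands expected after complete I-SceI digestion of a chromosome carrying one artificial site, and the physical intervals delimited by a set of single-site strains (n sites give n + 1 intervals). The optional full-length band models a diploid in which only one homolog carries the site, as in Example 1; the 680 kb and 50 kb figures used in the check are illustrative only.

```python
from typing import List


def digest_bands(chrom_len_kb: float, site_kb: float, diploid: bool = False) -> List[float]:
    """Expected PFG band sizes (kb, smallest first) after complete I-SceI digestion
    of a chromosome with one artificial site `site_kb` from its left end.
    With `diploid=True`, the uncut homolog adds a full-length band."""
    if not 0 < site_kb < chrom_len_kb:
        raise ValueError("the site must lie strictly inside the chromosome")
    bands = [site_kb, chrom_len_kb - site_kb]
    if diploid:
        bands.append(chrom_len_kb)
    return sorted(bands)


def milestone_intervals(chrom_len_kb: float, sites_kb: List[float]) -> List[float]:
    """Sizes of the physical intervals delimited by single-site strains:
    n milestones divide the chromosome into n + 1 intervals."""
    bounds = [0.0, *sorted(sites_kb), chrom_len_kb]
    return [right - left for left, right in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(digest_bands(680, 50, diploid=True))  # [50, 630, 680]
    print(milestone_intervals(680, [200, 50]))  # [50.0, 150, 480]
```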

[0147] The nested chromosomal fragments may be trans- 
ferred to a solid membrane and hybridized to a labelled 
probe containing DNA complementary to the DNA of the 
fragments. Based on the hybridization banding patterns that 
are observed, the eukaryotic genome may be mapped. The 
set of transgenic strains with appropriate "milestones" is 
used as a reference to map any new gene or clone by direct 
hybridization. 

EXAMPLE 1 

[0148] Application of the Nested Chromosomal Fragmen- 
tation Strategy to the Mapping of Yeast Chromosome XI 

[0149] This strategy has been applied to the mapping of chromosome XI of Saccharomyces cerevisiae. The I-SceI site was inserted at 7 different locations along chromosome XI of the diploid strain FY1679, thereby defining eight physical intervals in that chromosome. Sites were inserted from a URA3-I-SceI cassette by homologous recombination. Two sites were inserted within genetically defined genes, TIF1 and FAS1; the others were inserted at unknown positions in the chromosome from five non-overlapping cosmids of our library, taken at random. Agarose-embedded DNA of each of the seven transgenic strains was then digested with I-SceI and analyzed by pulsed field gel electrophoresis (FIG. 14A). The position of the I-SceI site of each transgenic strain in chromosome XI is first deduced from the fragment sizes, without consideration of the left/right orientation of the fragments. Orientation was determined as follows. The most telomere-proximal I-SceI site in this set of strains is in transgenic E40, because its 50 kb fragment is the shortest of all fragments (FIG. 15A). Therefore, the cosmid clone pUKG040, which was used to insert the I-SceI site in transgenic E40, is now used as a probe against all chromosome fragments (FIG. 14B). As expected, pUKG040 lights up the two fragments from strain E40 (50 kb and 630 kb, respectively). The large fragment is close to the entire chromosome XI and shows a weak hybridization signal because less than 4 kb of the 38 kb pUKG040 insert lies within the large chromosome fragment. Note that the entire chromosome XI remains visible after I-SceI digestion, because the transgenic strains are diploids in which the I-SceI site is inserted in only one of the two homologs. The pUKG040 probe hybridizes to only one fragment in each of the other transgenic strains, allowing unambiguous left/right orientation of the I-SceI sites (see FIG. 15B). No significant cross hybridization between the cosmid vector and the chromosome subfragment containing the I-SceI site insertion vector is visible. Transgenic strains can now be ordered such that the I-SceI sites are located at increasing distances from the hybridizing end of the chromosome (FIG. 15C), and the I-SceI map can be deduced (FIG. 15D). The precision of the mapping depends upon PFGE resolution and optimal calibration. Note that the actual left/right orientation of the chromosome with respect to the genetic map is not known at this step. To help visualize our strategy and to obtain more precise measurements of the interval sizes between I-SceI sites, a new pulsed field gel electrophoresis was run with the same transgenic strains, now placed in order (FIG. 16). After transfer, the fragments were hybridized successively with cosmids pUKG040 and pUKG066, which light up, respectively, all fragments from the opposite ends of the chromosome (clone pUKG066 defines the right end of the chromosome as defined from the genetic map, because it contains the SIR1 gene). A regular stepwise progression of chromosome fragment sizes is observed. Note some cross hybridization between the probe pUKG066 and chromosome III, probably due to repetitive DNA sequences.
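The orientation step reduces to one observation: because the end probe lies at one end of the chromosome, the fragment it lights up in a given strain runs from that end to the I-SceI site, so its size is the distance of the site from the probed end. A minimal sketch of the ordering under that assumption follows; the record layout and strain values are hypothetical. A strain such as E40, whose site lies inside the probe cosmid so that both fragments light up, would be entered with its shorter fragment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Strain:
    """One single-site transgenic strain (hypothetical record layout)."""
    name: str
    fragments_kb: Tuple[float, float]  # the two I-SceI fragments of the cut homolog
    probe_hit_kb: float                # the fragment lit up by the end probe


def order_from_probed_end(strains: List[Strain]) -> List[Tuple[str, float]]:
    """Order strains by the distance of their I-SceI site from the probed end.

    The fragment lit up by the end probe runs from that end to the site, so its
    size is the distance sought; the other fragment lies on the opposite side."""
    placed = []
    for s in strains:
        if s.probe_hit_kb not in s.fragments_kb:
            raise ValueError(f"{s.name}: probe signal must match one of its fragments")
        placed.append((s.name, s.probe_hit_kb))
    return sorted(placed, key=lambda item: item[1])


if __name__ == "__main__":
    demo = [Strain("A", (50.0, 630.0), 50.0),
            Strain("B", (300.0, 380.0), 380.0),
            Strain("C", (120.0, 560.0), 120.0)]
    print(order_from_probed_end(demo))  # [('A', 50.0), ('C', 120.0), ('B', 380.0)]
```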

[0150] All chromosome fragments, taken together, now define physical intervals as indicated in FIG. 15D. The I-SceI map obtained has an 80 kb average resolution.
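As a rough check on the stated resolution: seven sites define eight intervals, so the average interval is the chromosome length L divided by n + 1 with n = 7. Taking L as about 680 kb, the sum of the 50 kb and 630 kb fragments of strain E40 (an estimate only, since PFGE sizes carry calibration error), gives a value in line with the roughly 80 kb reported:

```latex
\bar{d} \;=\; \frac{L}{n+1} \;\approx\; \frac{(50 + 630)\ \text{kb}}{7 + 1} \;=\; 85\ \text{kb}
```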

EXAMPLE 2 

[0151] Application of the Nested Chromosomal Fragmen- 
tation Strategy to the Mapping of Yeast Artificial Chromo- 
some (YAC) Clones 

[0152] This strategy can be applied to YAC mapping with 
two possibilities. 

[0153] 1. Insertion of the I-SceI site within the gene of interest using homologous recombination in yeast. This permits mapping of that gene in the YAC insert by I-SceI digestion in vitro. This has been done successfully.

[0154] 2. Random integration of I-SceI sites along the YAC insert by homologous recombination in yeast using highly repetitive sequences (e.g., B2 in mouse or Alu in human). Transgenic strains are then used as described in ref. P1 to sort libraries or map genes.

[0155] The procedure has now been extended to a YAC containing 450 kb of mouse DNA. To this end, a repeated sequence of mouse DNA (called B2) has been inserted in a plasmid containing the I-SceI site and a selectable yeast marker (LYS2). Transformation of the yeast cells containing the recombinant YAC with the plasmid linearized within the B2 sequence resulted in the integration of the I-SceI site at five different locations distributed along the mouse DNA insert. Cleavage at the inserted I-SceI sites using the enzyme has been successful, producing nested fragments that can be purified after electrophoresis. The subsequent steps of the protocol exactly parallel the procedure described in Example 1.

EXAMPLE 3 

[0156] Application of Nested Chromosomal Fragments to 
the Direct Sorting of Cosmid Libraries 

[0157] The nested chromosomal fragments can be purified from preparative PFG and used as probes against clones from a chromosome XI-specific sublibrary. This sublibrary is composed of 138 cosmid clones (corresponding to eightfold coverage), which had previously been sorted from our complete yeast genomic libraries by colony hybridization with PFG-purified chromosome XI. This collection of unordered clones has been sequentially hybridized with chromosome fragments taken in order of increasing size from the left end of the chromosome. The localization of each cosmid clone on the I-SceI map could be unambiguously determined from such hybridizations. To further verify the results and to provide a more precise map, a subset of the cosmid clones, now placed in order, has been digested with EcoRI, electrophoresed and hybridized with the nested series of chromosome fragments in order of increasing size from the left end of the chromosome. Results are given in FIG. 17.
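The localization rule can be stated compactly. Each nested left-end fragment contains every shorter one, so a clone that lights up with one fragment should also light up with all larger ones, and the first fragment it hybridizes with identifies the interval holding its leftmost part. A minimal sketch under that reading, with a hypothetical function name:

```python
from typing import List


def assign_interval(hits: List[bool]) -> int:
    """Place a clone on the I-SceI map from sequential hybridizations.

    hits[i] is True if the clone hybridized with the i-th nested fragment,
    fragments being taken from the left end in order of increasing size.
    Returns the interval holding the clone's leftmost part: 0 = left end to
    first site, ..., len(hits) = beyond the last site (no left-end hit)."""
    first = next((i for i, h in enumerate(hits) if h), len(hits))
    # Nested fragments contain all shorter ones, so hits must persist.
    if not all(hits[first:]):
        raise ValueError("inconsistent pattern: nested fragments must keep hybridizing")
    return first


if __name__ == "__main__":
    print(assign_interval([False, False, True, True]))  # 2
```

The consistency check is a cheap way to flag cross-hybridization or mislabelled lanes before a clone is placed.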

[0158] For a given probe, two cases can be distinguished: cosmid clones in which all EcoRI fragments hybridize with the probe, and cosmid clones in which only some of the EcoRI fragments hybridize (e.g., compare pEKG100 to pEKG098 in FIG. 17B). The first category corresponds to clones in which the insert is entirely included in one of the two chromosome fragments, the second to clones in which the insert overlaps an I-SceI site. Note that, for clones of the pEKG series, the 8 kb EcoRI fragment is entirely composed of vector sequences (pWE15) that do not hybridize with the chromosome fragments. Where the chromosome fragment carries the integration vector, a weak cross hybridization with the cosmid is observed (FIG. lie).
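The two categories follow directly from which insert EcoRI fragments light up with a given chromosome fragment. A minimal sketch of the classification, assuming the 8 kb vector-only band has already been excluded and using hypothetical names and labels:

```python
from typing import List


def classify_clone(insert_hits: List[bool]) -> str:
    """Classify a cosmid clone from one probing with a chromosome fragment.

    insert_hits lists, for each EcoRI fragment of the insert (vector-only
    bands excluded beforehand), whether it hybridized with the probe."""
    if not insert_hits:
        raise ValueError("no insert fragments given")
    if all(insert_hits):
        return "insert within the probe fragment"   # first category
    if any(insert_hits):
        return "insert overlaps an I-SceI site"     # second category
    return "insert outside the probe fragment"


if __name__ == "__main__":
    print(classify_clone([True, False, True]))  # insert overlaps an I-SceI site
```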

[0159] Examination of FIG. 17 shows that the cosmid clones can unambiguously be ordered with respect to the I-SceI map (FIG. 13E), each clone falling either in a defined interval or across an I-SceI site. In addition, clones from the second category allow us to place some EcoRI fragments on the I-SceI map, while others remain unordered. The complete set of chromosome XI-specific cosmid clones, covering altogether eight times the equivalent of the chromosome, has been sorted with respect to the I-SceI map, as shown in FIG. 18.



[0160] 5. Partial Restriction Mapping Using I-SceI

[0161] In this embodiment, complete digestion of the DNA at the artificially inserted I-SceI site is followed by partial digestion with bacterial restriction endonucleases of choice. The restriction fragments are then separated by electrophoresis and blotted. Indirect end labelling is accomplished using left or right I-SceI half sites. This technique has been successful with yeast chromosomes and should be applicable without difficulty to YACs.

[0162] Partial restriction mapping has been done on yeast DNA and on mammalian cell DNA using the commercial enzyme I-SceI. DNA from cells containing an artificially inserted I-SceI site is first cleaved to completion by I-SceI. The DNA is then treated under partial cleavage conditions with bacterial restriction endonucleases of interest (e.g., BamHI) and electrophoresed along with size calibration markers. The DNA is transferred to a membrane and hybridized successively using the short sequences flanking the I-SceI site on either side (these sequences are known because they are part of the original insertion vector that was used to introduce the I-SceI site). Autoradiography (or an equivalent detection system using non-radioactive probes) permits the visualization of ladders, which directly represent the succession of the bacterial restriction endonuclease sites from the I-SceI site. The size of each band of the ladder is used to calculate the physical distance between the successive bacterial restriction endonuclease sites.
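Because the probe sits right next to the I-SceI cut, each ladder band runs from the I-SceI site to one partially cut bacterial site, so band size equals the distance of that site from the I-SceI site, and differences between successive bands give the spacings. A minimal sketch of that calculation, assuming the band running to the chromosome end (no bacterial cut) has already been set aside; the function name and band values are hypothetical.

```python
from typing import List, Tuple


def sites_from_ladder(band_sizes_kb: List[float]) -> Tuple[List[float], List[float]]:
    """Interpret an indirect end-labelling ladder.

    Returns the sorted positions of the bacterial sites measured from the
    I-SceI site, and the spacings between successive sites (the first
    spacing is measured from the I-SceI site itself)."""
    positions = sorted(band_sizes_kb)
    spacings = [b - a for a, b in zip([0.0] + positions, positions)]
    return positions, spacings


if __name__ == "__main__":
    print(sites_from_ladder([12.0, 3.5, 7.0]))  # ([3.5, 7.0, 12.0], [3.5, 3.5, 5.0])
```

Probing separately with the left and right flanking sequences gives one such ladder on each side of the I-SceI site.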

Application of I-SceI for In Vivo Site Directed Recombination

[0163] 1. Expression of I-SceI in Yeast

[0164] The synthetic I-SceI gene has been placed under the control of a galactose-inducible promoter on the multicopy plasmids pPEX7 and pPEX408. Expression is correct and induces the site-specific effects described below. A transgenic yeast with the I-SceI synthetic gene inserted in a chromosome under the control of an inducible promoter can be constructed.

[0165] 2. Effects of Site Specific Double Strand Breaks in Yeast (Refs. 18 and P4)

[0166] Effects on plasmid-borne I-SceI sites:

[0167] Intramolecular effects are described in detail in Ref. 18. Intermolecular (plasmid to chromosome) recombination can be predicted.

Effects on Chromosome Integrated I-SceI Sites

[0168] In a haploid cell, a single break within a chromosome at an artificial I-SceI site results in cell division arrest followed by death (only a few percent of cells survive). The presence of an intact sequence homologous to the cut site results in repair and 100% cell survival. In a diploid cell, a single break within a chromosome at an artificial I-SceI site results in repair using the chromosome homolog and 100% cell survival. In both cases, repair of the induced double strand break results in loss of heterozygosity, with deletion of the non-homologous sequences flanking the cut and insertion of the non-homologous sequences from the donor DNA molecule.
-continued

CGATTTTTGT GATGCTCGTC AGGGGGGCGG AGCCTATGGA AAAACGCCAG CAACGCGGCC 180 

TTTTTACGGT TCCTGGCCTT TTGCTGGCCT TTTGCTCACA TGTTCTTTCC TGCGTTATCC 240 

CCTGATTCTG TGGATAACCG TATTACCGCC TTTGAGTGAG CTGATACCGC TCGCCGCAGC 300

CGAACGACCG AGCGCAGCGA GTCAGTGAGC GAGGAAGCGG AAGAGCGCCC AATACGCAAA 360 

CCGCCTCTCC CCGCGCGTTG GCCGATTCAT TAATGCAGCT GGCACGACAG GTTTCCCGAC 420 

TGGAAAGCGG GCAGTGAGCG CAACGCAATT AATGTGAGTT AGCTCACTCA TTAGGCACCC 480 

CAGGCTTTAC ACTTTATGCT TCCGGCTCGT ATGTTGTGTG GAATTGTGAG CGGATAACAA 540 

TTTCACACAG GAAACAGCTA TGACCATGAT TACGAATTCT CATGTTTGAC AGCTTATCAT 600 

CGATAAGCTT TAATGCGGTA GTTTATCACA GTTAAATTGC TAACGCAGTC AGGCACCGTG 660 

TATGAAATCT AACAATGCGC TCATCGTCAT CCTCGGCACC GTCACCCTGG ATGCTGTAGG 720 

CATAGGCTTG GTTATGCCGG TACTGCCGGG CCTCTTGCGG GATATCCGCC TGATGCGTGA 780 

ACGTGACGGA CGTAACCACC GCGACATGTG TGTGCTGTTC CGCTGGGCAT GCCAGGACAA 840 

CTTCTGGTCC GGTAACGTGC TGAGCCCGGC CAAGCTTACT CCCCATCCCC CTGTTGACAA 900 

TTAATCATCG GCTCGTATAA TGTGTGGAAT TGTGAGCGGA TAACAATTTC ACACAGGAAA 960 

CAGGATCCAT GCATATGAAA AACATCAAAA AAAACCAGGT AATGAACCTG GGTCCGAACT 1020 

CTAAACTGCT GAAAGAATAC AAATCCCAGC TGATCGAACT GAACATCGAA CAGTTCGAAG 1080 

CAGGTATCGG TCTGATCCTG GGTGATGCTT ACATCCGTTC TCGTGATGAA GGTAAAACCT 1140

ACTGTATGCA GTTCGAGTGG AAAAACAAAG CATACATGGA CCACGTATGT CTGCTGTACG 1200 

ATCAGTGGGT ACTGTCCCCG CCGCACAAAA AAGAACGTGT TAACCACCTG GGTAACCTGG 1260 

TAATCACCTG GGGCGCCCAG ACTTTCAAAC ACCAAGCTTT CAACAAACTG GCTAACCTGT 1320 

TCATCGTTAA CAACAAAAAA ACCATCCCGA ACAACCTGGT TGAAAACTAC CTGACCCCGA 1380

TGTCTCTGGC ATACTGGTTC ATGGATGATG GTGGTAAATG GGATTACAAC AAAAACTCTA 1440 

CCAACAAATC GATCGTACTG AACACCCAGT CTTTCACTTT CGAAGAAGTA GAATACCTGG 1500 

TTAAGGGTCT GCGTAACAAA TTCCAACTGA ACTGTTACGT AAAAATCAAC AAAAACAAAG 1560 
CGATCATCTA CATCGATTCT ATGTCTTACC TGATCTTCTA CAACCTGATC AAACCGTACC 1620

TGATCCCGCA GATGATGTAC AAACTGCCGA ACACTATCTC CTCCGAAACT TTCCTGAAAT 1680

AATAAGTCGA CCTGCAGCCC AAGCTTGGCA CTGGCCGTCG TTTTACAACG TCGTGACT 1738 



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 37 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Leu Val Arg Gly Ala Glu Pro Met Glu Lys Arg Gln Gln Arg Gly
1 5 10 15

Leu Phe Thr Val Pro Gly Leu Leu Leu Ala Phe Cys Ser His Val Leu
20 25 30

Ser Cys Val Ile Pro
35
(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Gln Leu Ala Arg Gln Val Ser Arg Leu Glu Ser Gly Gln
1 5 10



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Met Leu Pro Ala Arg Met Leu Cys Gly Ile Val Ser Gly
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Met Thr Met Ile Thr Asn Ser His Val
1 5 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Lys Ser Asn Asn Ala Leu Ile Val Ile Leu Gly Thr Val Thr Leu
1 5 10 15

Asp Ala Val Gly Ile Gly Leu Val Met Pro Val Leu Pro Gly Leu Leu
20 25 30

Arg Asp Ile Arg Leu Met Arg Glu Arg Asp Gly Arg Asn His Arg Asp
35 40 45

Met Cys Val Leu Phe Arg Trp Ala Cys Gln Asp Asn Phe Trp Ser Gly
50 55 60

Asn Val Leu Ser Pro Ala Lys Leu Thr Pro His Pro Pro Val Asp Asn
65 70 75 80



(2) INFORMATION FOR SEQ ID NO: 15: 
(i) SEQUENCE CHARACTERISTICS: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 
ATCCCTATTG TCCCATTA 18 



What is claimed is:

1. An isolated DNA encoding the enzyme I-SceI, wherein the DNA has the nucleotide sequence:

ATG CAT ATG AAA AAC ATC AAA AAA AAC CAG GTA ATG 2670
1 M H M K N I K K N Q V M 12

2671 AAC CTC GGT CCG AAC TCT AAA CTG CTG AAA GAA TAC AAA TCC CAG CTG ATC GAA CTG AAC 2730
13 N L G P N S K L L K E Y K S Q L I E L N 32

2731 ATC GAA CAG TTC GAA GCA GGT ATC GGT CTG ATC CTG GGT GAT GCT TAC ATC CGT TCT CGT 2790
33 I E Q F E A G I G L I L G D A Y I R S R 52

2791 GAT GAA GGT AAA ACC TAC TGT ATG CAG TTC GAG TGG AAA AAC AAA GCA TAC ATG GAC CAC 2850
53 D E G K T Y C M Q F E W K N K A Y M D H 72

2851 GTA TGT CTG CTG TAC GAT CAG TGG GTA CTG TCC CCG CCG CAC AAA AAA GAA CGT GTT AAC 2910
73 V C L L Y D Q W V L S P P H K K E R V N 92

2911 CAC CTG GGT AAC CTG GTA ATC ACC TGG GGC GCC CAG ACT TTC AAA CAC CAA GCT TTC AAC 2970
93 H L G N L V I T W G A Q T F K H Q A F N 112

2971 AAA CTG GCT AAC CTG TTC ATC GTT AAC AAC AAA AAA ACC ATC CCG AAC AAC CTG GTT GAA 3030
113 K L A N L F I V N N K K T I P N N L V E 132

3031 AAC TAC CTG ACC CCG ATG TCT CTG GCA TAC TGG TTC ATG GAT GAT GGT GGT AAA TGG GAT 3090
133 N Y L T P M S L A Y W F M D D G G K W D 152

3091 TAC AAC AAA AAC TCT ACC AAC AAA TCG ATC GTA CTG AAC ACC CAG TCT TTC ACT TTC GAA 3150
153 Y N K N S T N K S I V L N T Q S F T F E 172

3151 GAA GTA GAA TAC CTG GTT AAG GGT CTG CGT AAC AAA TTC CAA CTG AAC TGT TAC GTA AAA 3210
173 E V E Y L V K G L R N K F Q L N C Y V K 192

3211 ATC AAC AAA AAC AAA CCG ATC ATC TAC ATC GAT TCT ATG TCT TAC CTG ATC TTC TAC AAC 3270
193 I N K N K P I I Y I D S M S Y L I F Y N 212



3271 CTG ATC AAA CCG TAC CTG ATC CCG CAG ATG ATG TAC AAA CTG CCG AAC ACT ATC TCC TCC 3330
213 L I K P Y L I P Q M M Y K L P N T I S S 232

3331 GAA ACT TTC CTG AAA TAA
233 E T F L K *



2. DNA comprising the nucleotide sequence as claimed in 
claim 1 operatively linked to a promoter. 

3. An isolated RNA sequence complementary to the nucleotide sequence of claim 1.

4. RNA complementary to the nucleotide sequence of 
claim 2. 

5. A vehicle comprising a vector containing the nucleotide 
sequence as claimed in claim 1. 

6. The vehicle as claimed in claim 5, wherein the vector 
is an SV-40 vector. 

7. The vehicle as claimed in claim 5, wherein the vector 
is plasmid pSVOAL. 

8. The vehicle as claimed in claim 5 having the identifying characteristics of the vector having culture collection accession number C.N.C.M. I-1014.

9. The vehicle as claimed in claim 5, wherein the vector is an expression vector.

10. A method of genetically mapping a eukaryotic genome that does not contain a natural restriction site for I-SceI, comprising the steps of:

(a) artificially inserting one or more I-SceI sites at various positions in the genome;

(b) completely cleaving said genome at the inserted I-SceI sites, with the restriction enzyme I-SceI, to produce nested chromosomal fragments;

(c) purifying said fragments of step (b) by pulsed field gel electrophoresis (PFG);

(d) transferring the fragments to a solid membrane; 

(e) hybridizing the fragments bound to said membrane to 
a labelled probe containing DNA complementary to 
said fragments; 

(f) detecting the hybridization banding patterns; and 

(g) mapping said eukaryotic genome based on the hybrid- 
ization banding patterns observed in step (f). 

11. The method of claim 10, wherein said eukaryotic genome is the yeast genome.

12. The method of claim 10, wherein said eukaryotic 
genome is the genome of the yeast artificial chromosome 
vector (YAC). 

13. The method of claim 10, wherein said step of artificially inserting one or more I-SceI sites comprises random insertion.

14. The method of claim 10, wherein said step of artificially inserting one or more I-SceI sites comprises homologous recombination.

15. The method of claim 11, wherein the probe of step (e) 
is derived from a cosmid clone, pUKG040. 



16. The method of claim 11, wherein the probe of step (e) 
is derived from a cosmid clone, pUKG066. 

17. The method of claim 10, wherein the nested chromosomal fragments of step (c) are used as hybridization probes to sort cosmid libraries.

18. The method of claim 10, wherein after step (b), the 
genome is partially digested with bacterial restriction 
enzymes of choice and then electrophoresed, as in step (c), 
with size calibration markers. 

19. A method for in vivo site directed genetic recombination in an organism using enzyme I-SceI, comprising the steps of:

(a) introducing a synthetic gene encoding the I-SceI endonuclease into an expression vector;

(b) inserting an I-SceI restriction site next to or within a gene of interest carried on a plasmid;

(c) co-transforming the cells of said organism with said expression vector of step (a) and said plasmid of step (b), whereby said gene of interest, carried by said plasmid of step (b), is inserted into a chromosome of said organism at a specific site.

20. The method of claim 19, wherein said organism is 
yeast. 

21. The method of claim 19, wherein said organism is 
bacteria. 

22. The method of claim 19, wherein said organism is 
mouse. 

23. The method of claim 19, wherein said synthetic gene of step (a) is under the control of a galactose inducible promoter.

24. The method of claim 23, wherein said expression 
vector is plasmid pPEX408. 

25. The method of claim 23, wherein said expression 
vector is plasmid pPEX7. 

26. A method of genetically mapping a genome that does not contain a natural restriction site for I-SceI, comprising the steps of:

(a) artificially inserting one or more I-SceI sites at various positions in the genome;

(b) completely cleaving said genome at the inserted I-SceI sites, with the restriction enzyme I-SceI, to produce nested chromosomal fragments;

(c) purifying said fragments of step (b); and 

(d) mapping said eukaryotic genome by detecting said 
fragments. 

* * * * *
methods for replacing a natural gene with another gene that is capable of alleviating the disease or genetic disorder.



SUMMARY OF THE INVENTION 
Accordingly, this invention aids in fulfilling these 
needs in the art. Specifically, this invention relates to an 
isolated DNA encoding the enzyme I-SceI. The DNA has the
following nucleotide sequence: 
3151 GAA GTA GAA TAC CTG GTT AAG GGT CTG CGT AAC AAA TTC CAA CTG AAC TGT TAC GTA AAA 3210
173 E V E Y L V K G L R N K F Q L N C Y V K 192

3211 ATC AAC AAA AAC AAA CCG ATC ATC TAC ATC GAT TCT ATG TCT TAC CTG ATC TTC TAC AAC 3270
193 I N K N K P I I Y I D S M S Y L I F Y N 212

3271 CTG ATC AAA CCG TAC CTG ATC CCG CAG ATG ATG TAC AAA CTG CCG AAC ACT ATC TCC TCC 3330
213 L I K P Y L I P Q M M Y K L P N T I S S 232

3331 GAA ACT TTC CTG AAA TAA
233 E T F L K *
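Each amino-acid line of the sequence is the standard-genetic-code translation of the codon line above it, which makes transcription errors easy to catch: TAG in place of TAC, for example, would read as a stop codon instead of Tyr. A minimal sketch of that check, using a compact encoding of the standard codon table; the names are illustrative, not part of the invention.

```python
# Standard genetic code in TCAG order (first base varies slowest, third fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(codons: str) -> str:
    """Translate a space-separated codon line; '*' marks a stop codon."""
    seq = "".join(codons.split()).upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


if __name__ == "__main__":
    row_3271 = ("CTG ATC AAA CCG TAC CTG ATC CCG CAG ATG "
                "ATG TAC AAA CTG CCG AAC ACT ATC TCC TCC")
    print(translate(row_3271))                   # LIKPYLIPQMMYKLPNTISS
    print(translate("GAA ACT TTC CTG AAA TAA"))  # ETFLK*
```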
I-SceI is a double-stranded endonuclease that cleaves DNA within its recognition site. I-SceI generates a 4 bp staggered cut with 3' OH overhangs.

Substrate: Acts only on double-stranded DNA. Substrate DNA can be relaxed or negatively supercoiled.

Cations: Enzymatic activity requires Mg++ (8 mM is optimum). Mn++ can replace Mg++, but this reduces the stringency of recognition.

Optimum conditions for activity: high pH (9 to 10), temperature 20-40°C, no monovalent cations.

Enzyme stability: I-SceI is unstable at room temperature. The enzyme-substrate complex is more stable than the enzyme alone (the presence of recognition sites stabilizes the enzyme).

The enzyme I-SceI has a known recognition site (ref. 14). The recognition site of I-SceI is a non-symmetrical sequence that extends over 18 bp, as determined by systematic mutational analysis. The sequence reads (arrows indicate cuts):



5' TAGGGATAA↓CAGGGTAAT 3'

3' ATCCC↑TATTGTCCCATTA 5'
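The statement that the staggered cut leaves a 4 bp 3' overhang can be checked against the 18 bp site. The sketch below assumes the cleavage positions commonly reported for I-SceI (top strand cut after base 9, bottom strand cut opposite the junction after base 5, counting from the left of the site as printed); those positions are an assumption here, and the helper names are hypothetical.

```python
from typing import Tuple

# I-SceI site as printed: top strand 5'->3', bottom strand written 3'->5'.
TOP = "TAGGGATAACAGGGTAAT"
BOTTOM_3_TO_5 = "ATCCCTATTGTCCCATTA"

# Assumed cut positions (0-based boundaries within the 18 bp site).
TOP_CUT = 9      # top strand:    TAGGGATAA | CAGGGTAAT
BOTTOM_CUT = 5   # bottom strand: ATCCC | TATTGTCCCATTA

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, keeping the written orientation."""
    return seq.translate(_COMPLEMENT)


def overhang(top: str = TOP, top_cut: int = TOP_CUT,
             bottom_cut: int = BOTTOM_CUT) -> Tuple[str, str]:
    """Return (top-strand bases between the cuts, which end protrudes).

    If the top-strand cut lies to the right of the bottom-strand cut, the
    left-hand product keeps extra top-strand bases at its 3' end (a 3'
    overhang); the reverse arrangement gives a 5' overhang."""
    lo, hi = sorted((top_cut, bottom_cut))
    end = "3'" if top_cut > bottom_cut else "5'"
    return top[lo:hi], end


if __name__ == "__main__":
    print(overhang())  # ('ATAA', "3'")
```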




and polymerization with the DNA polymerase. Amplified sequences can be detected by the use of a technique termed oligomer restriction (OR). See R. K. Saiki et al., Bio/Technology 3:1008-1012 (1985).

The enzyme I-SceI is one of a number of endonucleases with similar properties. Following is a listing of related enzymes and their sources.

Group I intron encoded endonucleases and related enzymes 
are listed below with references. Recognition sites are 
shown in Fig. 6. 



Enzyme      Encoded by             Ref.
I-SceI      Sc LSU-1 intron        this work
I-SceII     Sc cox1-4 intron       Sargueil et al., NAR (1990) 18, 5659-5665
I-SceIII    Sc cox1-3 intron       Sargueil et al., MGG (1991) 225, 340-341
I-SceIV     Sc cox1-5a intron      Seraphin et al. (1992) in press
I-CeuI      Ce LSU-5 intron        Marshall, Lemieux, Gene (1991) 104, 241-245
I-CreI      Cr LSU-1 intron        Rochaix (unpublished)
I-PpoI      Pp LSU-3 intron        Muscarella et al., MCB (1990) 10, 3386-3396
I-TevI      T4 td-1 intron         Chu et al., PNAS (1990) 87, 3574-3578 and Bell-Pedersen et al., NAR (1990) 18, 3763-3770
I-TevII     T4 sunY intron         Bell-Pedersen et al., NAR (1990) 18, 3763-3770
I-TevIII    RB3 nrdB-1 intron      Eddy, Gold, Genes Dev. (1991) 5, 1032-1041
HO          HO yeast gene          Nickoloff et al., MCB (1990) 10, 1174-1179
Endo SceI   RF3 yeast mito. gene   Kawasaki et al., JBC (1991) 266, 5342-5347



Putative new enzymes (genetic evidence but no activity as yet) are I-CsmI from cytochrome b intron 1 of Chlamydomonas smithii mitochondria (ref. 15), I-PanI from cytochrome b intron 3 of Podospora anserina mitochondria (Jill Salvo), and probably enzymes encoded by introns Nc nd1-1 and Nc cob-1 from Neurospora crassa.

The I-endonucleases can be classified as follows: 



INN EGAS. Henderson 
FAR.A30v3t'. Garrett 
& Dinner 

:300 I STacrr. n w 
.^iSMtNOTCN. CC 2O0C5 
I 2C2 <0e -000 



Class I: two dodecapeptide motifs, 4 bp staggered cut with
3' OH overhangs, cut internal to recognition site:
    Subclass "I-SceI": I-SceI, I-SceIV, I-CsmI, I-PanI
    Other subclasses: I-SceII, I-SceIII, I-CeuI (only one
    dodecapeptide motif), I-CreI (only one dodecapeptide
    motif), HO, TFP1-408 (HO homolog), Endo SceI

Class II: GIY-(N10-11)-YIG motif, 2 bp staggered cut with 3'
OH overhangs, cut external to recognition site:
    I-TevI

Class III: no typical structural motifs, 4 bp staggered cut
with 3' OH overhangs, cut internal to recognition site:
    I-PpoI

Class IV: no typical structural motifs, 2 bp staggered cut
with 3' OH overhangs, cut external to recognition site:
    I-TevII

Class V: no typical structural motifs, 2 bp staggered cut
with 5' OH overhangs:
    I-TevIII
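The classification above can be held as a small lookup table, for example when annotating a set of candidate enzymes. The sketch below is purely illustrative: the field names and the `class_of` helper are hypothetical, and the entries are transcribed from the scheme above. The scheme does not state the cut position for Class V, so it is left as `None`.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ClassFeatures:
    motif: str                   # conserved structural motif, or "none"
    stagger_bp: int              # stagger of the double-strand cut, in bp
    overhang: str                # "3' OH" or "5' OH"
    cut_position: Optional[str]  # "internal"/"external" to the site; None if not stated


# Defining features of each class, transcribed from the scheme above.
CLASS_FEATURES: Dict[str, ClassFeatures] = {
    "I": ClassFeatures("two dodecapeptides", 4, "3' OH", "internal"),
    "II": ClassFeatures("GIY-(N10-11)-YIG", 2, "3' OH", "external"),
    "III": ClassFeatures("none", 4, "3' OH", "internal"),
    "IV": ClassFeatures("none", 2, "3' OH", "external"),
    "V": ClassFeatures("none", 2, "5' OH", None),
}

# Members of each class (I-CeuI and I-CreI carry only one dodecapeptide motif).
MEMBERS: Dict[str, Tuple[str, ...]] = {
    "I": ("I-SceI", "I-SceIV", "I-CsmI", "I-PanI",     # subclass "I-SceI"
          "I-SceII", "I-SceIII", "I-CeuI", "I-CreI",   # other subclasses
          "HO", "TFP1-408", "Endo SceI"),
    "II": ("I-TevI",),
    "III": ("I-PpoI",),
    "IV": ("I-TevII",),
    "V": ("I-TevIII",),
}


def class_of(enzyme: str) -> Optional[str]:
    """Return the class an enzyme is listed under, or None if it is not listed."""
    for cls, names in MEMBERS.items():
        if enzyme in names:
            return cls
    return None


print(class_of("I-PpoI"), CLASS_FEATURES["III"])
```

Keeping the defining features separate from the membership lists means a newly characterized enzyme only needs to be added to one tuple.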
York (1982). The nucleic acid will generally be obtained
from a bacterium, yeast, virus, or a higher organism, such as
a plant or animal. The nucleic acid can be a fraction of a
more complex mixture, such as a portion of a gene contained
in whole human DNA or a portion of a nucleic acid sequence of
a particular microorganism. The nucleic acid can be a
fraction of a larger molecule, or the nucleic acid can
constitute an entire gene or assembly of genes. The DNA can
be in single-stranded or double-stranded form. If the
fragment is in single-stranded form, it can be converted to
double-stranded form using DNA polymerase according to
conventional techniques.
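In silico, the strand that a DNA polymerase synthesizes on a single-stranded template is its reverse complement. The following minimal Python sketch computes it; the function names are illustrative and not taken from this specification, and it models only the base-pairing outcome, not primer extension itself.

```python
from typing import Tuple

# Watson-Crick pairing; N (unknown base) pairs with N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complementary_strand(template: str) -> str:
    """Return the strand that pairs with `template`.

    Both strands are written 5'->3', so the new strand is the reverse
    complement of the template: complement each base, then reverse.
    """
    return template.translate(_COMPLEMENT)[::-1]


def to_double_stranded(single: str) -> Tuple[str, str]:
    """Pair a single strand with its complement as a (top, bottom) tuple."""
    return single, complementary_strand(single)


print(to_double_stranded("ATGCATATGAAA"))
```

Writing both strands 5'->3' follows the usual sequence-listing convention, which is why the complement is reversed rather than read in place.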

The DNA sequence of the invention can be linked to a
structural gene. As used herein, the term "structural gene"
refers to a DNA sequence that encodes, through its template or
messenger RNA (mRNA), a sequence of amino acids characteristic
of a specific protein or polypeptide. The nucleotide sequence
of the invention can function with an expression control
sequence, that is, a DNA sequence that controls and regulates
expression of the gene when operatively linked to the gene.
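How a coding sequence specifies "a sequence of amino acids" can be illustrated by translating it codon by codon. The sketch below assumes the standard genetic code and reads in frame from the first base; the example codons are the first 12 codons and the last 6 codons of the synthetic I-SceI coding sequence given in claim 1. The `translate` helper is illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed for codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str, to_stop: bool = False) -> str:
    """Translate a DNA coding sequence read in frame from its first base.

    Whitespace is ignored, so codon-spaced text can be passed directly.
    Stop codons are shown as '*'; with to_stop=True translation ends there.
    """
    seq = "".join(cds.split()).upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    protein = []
    for i in range(0, len(seq), 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATG CAT ATG AAA AAC ATC AAA AAA AAC CAG GTA ATG"))  # MHMKNIKKNQVM
print(translate("GAA ACT TTC CTG AAA TAA"))                        # ETFLK*
```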



4. Vectors Containing the Nucleotide 
Sequence of the Invention 

This invention also relates to cloning and expression
vectors containing the DNA sequence of the invention coding
for the enzyme I-SceI.

More particularly, the DNA sequence encoding the enzyme is
ligated to a vehicle for cloning the sequence. The
Examples of yeast expression vectors are:
pPEX7, which is a yeast expression vector derived from
    pRP51-Bam O (a LEU2d derivative of pLG-SD5) by insertion
    of the synthetic gene under the control of the galactose
    promoter. Expression is induced by galactose.
pPEX408, which is a yeast expression vector derived from
    pLG-SD5 by insertion of the synthetic gene under the
    control of the galactose promoter. Expression is
    induced by galactose.
Several yeast expression vectors are depicted in Fig. 7.

Typical mammalian expression vectors are: 
pRSV I-SceI, which is a pRSV derivative in which the
    synthetic gene (BamHI-PstI fragment from pSCM525) is
    under the control of the LTR promoter of Rous Sarcoma
    Virus. This expression vector is depicted in Fig. 8.
Vectors for expression in Chinese Hamster Ovary (CHO) cells 
can also be employed. 

5. Cells Transformed with Vectors of the Invention

The vectors of the invention can be inserted into host
organisms using conventional techniques. For example, the
vectors can be inserted by transformation, transfection,
electroporation, microinjection, or by means of liposomes
(lipofection).

Cloning can be carried out in prokaryotic or eukaryotic 
cells. The host for replicating the cloning vehicle will of 
course be one that is compatible with the vehicle and in 



hybridization between the probe pUKG066 and chromosome III, 
probably due to some repetitive DNA sequences. 

All chromosome fragments, taken together, now define
physical intervals as indicated in Fig. 15d. The I-SceI map
obtained has an 80 kb average resolution.
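The logic of the nested fragmentation map can be sketched as follows. The sketch assumes one I-SceI site per strain, site positions measured in kb from one chromosome end, and complete cleavage. All function names and numbers are hypothetical, and the real map is built from gel sizes and hybridization patterns rather than exact coordinates.

```python
from typing import Dict, List, Tuple


def nested_fragments(length_kb: float, sites_kb: List[float]) -> List[Tuple[float, float]]:
    """Fragments from strains that each carry ONE I-SceI site.

    A site at p kb from the left end gives a left fragment of p kb and a
    right fragment of (length - p) kb. Sorted by p, the left fragments are
    nested: each one contains all shorter ones.
    """
    return [(p, length_kb - p) for p in sorted(sites_kb)]


def intervals(length_kb: float, sites_kb: List[float]) -> List[Tuple[float, float]]:
    """Physical intervals delimited by the chromosome ends and all sites."""
    bounds = [0.0] + sorted(sites_kb) + [length_kb]
    return list(zip(bounds[:-1], bounds[1:]))


def average_resolution(length_kb: float, sites_kb: List[float]) -> float:
    """Mean interval length in kb."""
    spans = [hi - lo for lo, hi in intervals(length_kb, sites_kb)]
    return sum(spans) / len(spans)


def locate_probe(length_kb: float, on_left: Dict[float, bool]) -> Tuple[float, float]:
    """Bracket a probe from its hybridization pattern.

    on_left[p] is True if, in the strain cut at p, the probe hybridized to
    the left fragment (probe lies left of p), False if to the right one.
    """
    lo, hi = 0.0, length_kb
    for p, left in on_left.items():
        if left:
            hi = min(hi, p)
        else:
            lo = max(lo, p)
    return lo, hi


# Hypothetical 500 kb chromosome with sites at 100, 250 and 400 kb.
sites = [100.0, 250.0, 400.0]
print(nested_fragments(500.0, sites))
print(average_resolution(500.0, sites))
print(locate_probe(500.0, {100.0: False, 250.0: False, 400.0: True}))
```

With n sites the chromosome is divided into n + 1 intervals, so the average resolution improves in proportion to the number of independently placed sites.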

Example 2 : Application of the Nested Chromosomal 
Fragmentation Strategy to the Mapping of Yeast Artificial 
Chromosome (YAC) Clones 

This strategy can be applied to YAC mapping in two
ways:

-1- insertion of the I-SceI site within the gene of
interest using homologous recombination in yeast. This
permits mapping of that gene in the YAC insert by I-SceI
digestion in vitro. This approach has been carried out and
works.

-2- random integration of I-SceI sites along the YAC
insert by homologous recombination in yeast using highly
repetitive sequences (e.g., B2 in mouse or Alu in human).
Transgenic strains are then used as described in ref. P1 to
sort libraries or map genes.
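For option -1-, the expected I-SceI cleavage points in a sequenced insert can be predicted by scanning both strands for the recognition sequence. The sketch assumes the commonly cited 18-bp I-SceI site, 5'-TAGGGATAACAGGGTAAT-3' (Fig. 6 gives the authoritative sequence), and reports 0-based start positions. `find_sites` is a hypothetical helper. Because this site is not palindromic, each strand must be searched separately.

```python
from typing import List, Tuple

# Assumed I-SceI recognition site (5'->3'); check against Fig. 6.
ISCEI_SITE = "TAGGGATAACAGGGTAAT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str = ISCEI_SITE) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every copy of `site` in `seq`.

    '+' means the site reads 5'->3' on the given strand; '-' means it reads
    5'->3' on the complementary strand. Overlapping matches are reported.
    """
    seq = seq.upper()
    hits = []
    for motif, strand in ((site, "+"), (reverse_complement(site), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Toy sequence with one site on each strand.
demo = "GGG" + ISCEI_SITE + "CC" + reverse_complement(ISCEI_SITE) + "A"
print(find_sites(demo))  # [(3, '+'), (23, '-')]
```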

The procedure has now been extended to a YAC containing
450 kb of mouse DNA. To this end, a repeated sequence of
mouse DNA (called B2) has been inserted in a plasmid
containing the I-SceI site and a selectable yeast marker
(LYS2). Transformation of the yeast cells containing the
recombinant YAC with the plasmid linearized within the B2
sequence resulted in the integration of the I-SceI site at
five different locations distributed along the mouse DNA
insert. Cleavage at the inserted I-SceI sites using the
under the control of an inducible promoter can be
constructed.

2. Effects of site-specific double-strand breaks in
yeast (refs. 18 and P4)

Effects on plasmid-borne I-SceI sites:

Intramolecular effects are described in detail in
Ref. 18. Intermolecular (plasmid-to-chromosome)
recombination can be predicted.

Effects on chromosome-integrated I-SceI sites:

In a haploid cell, a single break within a chromosome at
an artificial I-SceI site results in cell division arrest
followed by death (only a few percent of cells survive).
The presence of an intact sequence homologous to the cut site
results in repair and 100% cell survival. In a diploid cell,
a single break within a chromosome at an artificial I-SceI
site results in repair using the chromosome homolog and 100%
cell survival. In both cases, repair of the induced double-
strand break results in loss of heterozygosity, with deletion
of the nonhomologous sequences flanking the cut and insertion
of the nonhomologous sequences from the donor DNA molecule.

3. Application to in vivo recombination of YACs in yeast

Construction of a YAC vector with the I-SceI restriction
site next to the cloning site should permit one to induce
homologous recombination with another YAC if inserts are

TTTCACACAG GAAACAGCTA TGACCATGAT TACGAATTCT CATGTTTGAC AGCTTATCAT 600
CGATAAGCTT TAATGCGGTA GTTTATCACA GTTAAATTGC TAACGCAGTC AGGCACCGTG 660
TATGAAATCT AACAATGCGC TCATCGTCAT CCTCGGCACC GTCACCCTGG ATGCTGTAGG 720
CATAGGCTTG GTTATGCCGG TACTGCCGGG CCTCTTGCGG GATATCCGCC TGATGCGTGA 780
ACGTGACGGA CGTAACCACC GCGACATGTG TGTGCTGTTC CGCTGGGCAT GCCAGGACAA 840
CTTCTGGTCC GGTAACGTGC TGAGCCCGGC CAAGCTTACT CCCCATCCCC CTGTTGACAA 900
TTAATCATCG GCTCGTATAA TGTGTGGAAT TGTGAGCGGA TAACAATTTC ACACAGGAAA 960
CAGGATCCAT GCATATGAAA AACATCAAAA AAAACCAGGT AATGAACCTG GGTCCGAACT 1020
CTAAACTGCT GAAAGAATAC AAATCCCAGC TGATCGAACT GAACATCGAA CAGTTCGAAG 1080
CAGGTATCGG TCTGATCCTG GGTGATGCTT ACATCCGTTC TCGTGATGAA GGTAAAACCT 1140
ACTGTATGCA GTTCGAGTGG AAAAACAAAG CATACATGGA CCACGTATGT CTGCTGTACG 1200
ATCAGTGGGT ACTGTCCCCG CCGCACAAAA AAGAACGTGT TAACCACCTG GGTAACCTGG 1260
TAATCACCTG GGGCGCCCAG ACTTTCAAAC ACCAAGCTTT CAACAAACTG GCTAACCTGT 1320
TCATCGTTAA CAACAAAAAA ACCATCCCGA ACAACCTGGT TGAAAACTAC CTGACCCCGA 1380
TGTCTCTGGC ATACTGGTTC ATGGATGATG GTGGTAAATG GGATTACAAC AAAAACTCTA 1440
CCAACAAATC GATCGTACTG AACACCCAGT CTTTCACTTT CGAAGAAGTA GAATACCTGG 1500
TTAAGGGTCT GCGTAACAAA TTCCAACTGA ACTGTTACGT AAAAATCAAC AAAAACAAAG 1560
CGATCATCTA CATCGATTCT ATGTCTTACC TGATCTTCTA CAACCTGATC AAACCGTACC 1620
TCATCCCCCA GATGATGTAC AAACTGCCGA ACACTATCTC CTCCGAAACT TTCCTGAAAT 1680
AATAAGTCGA CCTGCAGCCC AAGCTTGGCA CTGGCCGTCG TTTTACAACG TCGTGACT   1738
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
What is claimed is:

1. An isolated DNA encoding the enzyme I-SceI, wherein
the DNA has the nucleotide sequence:



     ATG CAT ATG AAA AAC ATC AAA AAA AAC CAG GTA ATG 2670
      M   H   M   K   N   I   K   K   N   Q   V   M    12

2671 AAC CTC GGT CCG AAC TCT AAA CTG CTG AAA GAA TAC AAA TCC CAG CTG ATC GAA CTG AAC 2730
  13  N   L   G   P   N   S   K   L   L   K   E   Y   K   S   Q   L   I   E   L   N    32

2731 ATC GAA CAG TTC GAA GCA GGT ATC GGT CTG ATC CTG GGT GAT GCT TAC ATC CGT TCT CGT 2790
  33  I   E   Q   F   E   A   G   I   G   L   I   L   G   D   A   Y   I   R   S   R    52

2791 GAT GAA GGT AAA ACC TAC TGT ATG CAG TTC GAG TGG AAA AAC AAA GCA TAC ATG GAC CAC 2850
  53  D   E   G   K   T   Y   C   M   Q   F   E   W   K   N   K   A   Y   M   D   H    72

2851 GTA TGT CTG CTG TAC GAT CAG TGG GTA CTG TCC CCG CCG CAC AAA AAA GAA CGT GTT AAC 2910
  73  V   C   L   L   Y   D   Q   W   V   L   S   P   P   H   K   K   E   R   V   N    92

2911 CAC CTG GGT AAC CTG GTA ATC ACC TGG GGC GCC CAG ACT TTC AAA CAC CAA GCT TTC AAC 2970
  93  H   L   G   N   L   V   I   T   W   G   A   Q   T   F   K   H   Q   A   F   N   112

2971 AAA CTG GCT AAC CTG TTC ATC GTT AAC AAC AAA AAA ACC ATC CCG AAC AAC CTG GTT GAA 3030
 113  K   L   A   N   L   F   I   V   N   N   K   K   T   I   P   N   N   L   V   E   132

3031 AAC TAC CTG ACC CCG ATG TCT CTG GCA TAC TGG TTC ATG GAT GAT GGT GGT AAA TGG GAT 3090
 133  N   Y   L   T   P   M   S   L   A   Y   W   F   M   D   D   G   G   K   W   D   152

3091 TAC AAC AAA AAC TCT ACC AAC AAA TCG ATC GTA CTG AAC ACC CAG TCT TTC ACT TTC GAA 3150
 153  Y   N   K   N   S   T   N   K   S   I   V   L   N   T   Q   S   F   T   F   E   172

3151 GAA GTA GAA TAC CTG GTT AAG GGT CTG CGT AAC AAA TTC CAA CTG AAC TGT TAC GTA AAA 3210
 173  E   V   E   Y   L   V   K   G   L   R   N   K   F   Q   L   N   C   Y   V   K   192

3211 ATC AAC AAA AAC AAA CCG ATC ATC TAC ATC GAT TCT ATG TCT TAC CTG ATC TTC TAC AAC 3270
 193  I   N   K   N   K   P   I   I   Y   I   D   S   M   S   Y   L   I   F   Y   N   212

3271 CTG ATC AAA CCG TAC CTG ATC CCG CAG ATG ATG TAC AAA CTG CCG AAC ACT ATC TCC TCC 3330
 213  L   I   K   P   Y   L   I   P   Q   M   M   Y   K   L   P   N   T   I   S   S   232

3331 GAA ACT TTC CTG AAA TAA
 233  E   T   F   L   K   *
2. DNA comprising the nucleotide sequence as claimed in
claim 1 operatively linked to a promoter. 

3. An isolated RNA sequence complementary to the 
nucleotide sequence of claim 1. 
(f) detecting the hybridization banding patterns; and

(g) mapping said eukaryotic genome based on the
hybridization banding patterns observed in step (f).

11. The method of claim 10, wherein said eukaryotic
genome is the yeast genome.

12. The method of claim 10, wherein said eukaryotic
genome is the genome of the yeast artificial chromosome
vector (YAC).

13. The method of claim 10, wherein said step of
artificially inserting one or more I-SceI sites comprises
random insertion.

14. The method of claim 10, wherein said step of
artificially inserting one or more I-SceI sites comprises
homologous recombination.

15. The method of claim 11, wherein the probe of step
(e) is derived from a cosmid clone, pUKG040.

16. The method of claim 11, wherein the probe of step
(e) is derived from a cosmid clone, pUKG066.

17. The method of claim 10, wherein the nested
chromosomal fragments of step (c) are used as hybridization
probes to sort cosmid libraries.

18. The method of claim 10, wherein after step (b), the
genome is partially digested with bacterial restriction
enzymes of choice and then electrophoresed, as in step (c),
with size calibration markers.

19. A method for in vivo site directed genetic
recombination in an organism using enzyme I-SceI, comprising
the steps of:



